-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: pandas.to_datetime() does not respect exact format string with ISO8601 #49333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8a11b2c
to
d834750
Compare
Co-Authored-By: MarcoGorelli <> Co-Authored-By: FDRocha <>
d834750
to
45f82c3
Compare
#define FORMAT_STARTSWITH(ch) \ | ||
if (exact) { \ | ||
if (!format_len || *format != ch) { \ | ||
goto parse_error; \ | ||
} \ | ||
++format; \ | ||
--format_len; \ | ||
} else { \ | ||
if (format_len > 0) { \ | ||
if (*format != ch) { \ | ||
goto parse_error; \ | ||
} \ | ||
++format; \ | ||
--format_len; \ | ||
} \ | ||
} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super minor nit: maybe this could be a bit easier to parse?
// Always error on character mismatch conditioned on non-exhausted format, or when format is exhausted in the exact case.
if ((format_len && *format != ch) || (exact && !format_len)) goto parse_error;
// Advance if format is not exhausted
if (format_len) {
++format;
--format_len;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your suggestion! are you sure this is right though? if exact=False
and we're at the end of the format string, won't *format != ch
resolve to True
and so go to the parse_error
? At least, if I try this, then I get a few test failures, e.g.
(.venv) megorelli@penguin:~/pandas-dev$ cat f.py
import pandas as pd
pd.to_datetime(['2022-01-01'], format='%Y-%m', exact=False)
(.venv) megorelli@penguin:~/pandas-dev$ python f.py
Traceback (most recent call last):
File "/home/megorelli/pandas-dev/f.py", line 3, in <module>
pd.to_datetime(['2022-01-01'], format='%Y-%m', exact=False)
File "/home/megorelli/pandas-dev/pandas/core/tools/datetimes.py", line 1130, in to_datetime
result = convert_listlike(argc, format)
File "/home/megorelli/pandas-dev/pandas/core/tools/datetimes.py", line 441, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64ns(
File "/home/megorelli/pandas-dev/pandas/core/arrays/datetimes.py", line 2205, in objects_to_datetime64ns
result, tz_parsed = tslib.array_to_datetime(
File "pandas/_libs/tslib.pyx", line 440, in pandas._libs.tslib.array_to_datetime
cpdef array_to_datetime(
File "pandas/_libs/tslib.pyx", line 622, in pandas._libs.tslib.array_to_datetime
raise ValueError(
ValueError: time data "2022-01-01" at position 0 doesn't match format "%Y-%m"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format exhaustion is not tested outside of the macro, right? Maybe iterating over the format string is better than over the original string in the external loop, especially for the case when exact=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli , I have updated the check so that *format != ch
is conditioned on format_len > 0
. Would that work now? I would still argue that changing this whole function to iterate over format is a better approach, but I might be missing some use cases for which this is not true...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's cleaner, thanks!
Regarding iterating over format
- not sure that'd be possible, because there are parts of the codebase which reach here without passing format
. An option could be to do something like:
- if
format == ''
andnot exact
then use the current path, - else if
format != ''
, then iterate overformat
until it's exhausted, then use the current codepath
but I think that would add even more complexity (if I've understood correctly)
pandas/_libs/tslib.pyx
Outdated
@@ -445,6 +445,8 @@ cpdef array_to_datetime( | |||
bint utc=False, | |||
bint require_iso8601=False, | |||
bint allow_mixed=False, | |||
str format="", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use None as a default rather than ""
? I think can just type as Optional[str] which would be a bit cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, could do that here, but then wouldn't it still need to be str
when it's passed to the C function in
pandas/pandas/_libs/tslibs/np_datetime.pyx
Lines 287 to 290 in 45f82c3
format_buf = get_c_string_buf_and_size(format, &format_length) | |
return parse_iso_8601_datetime(buf, length, want_exc, | |
dts, out_bestunit, out_local, out_tzoffset, | |
format_buf, format_length, exact) |
? If I understand correctly, at some point the conversion from None
to ""
would have to happen anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If None can be interpreted as nullptr (can we just pass 0
?), I agree that having None is a cleaner option for something that is user-facing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't user-facing, it's just used internally
Not sure I understand about None being interpreted as nullptr, but if I pass None
to get_c_string_buf_and_size
I get
Traceback (most recent call last):
File "t.py", line 109, in <module>
print(to_datetime('2012-01-01T10:00', format='%Y-%m-%dT%H:%M', exact=False))
File "/home/marcogorelli/pandas-dev/pandas/core/tools/datetimes.py", line 1117, in to_datetime
result = convert_listlike(np.array([arg]), format)[0]
File "/home/marcogorelli/pandas-dev/pandas/core/tools/datetimes.py", line 441, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64ns(
File "/home/marcogorelli/pandas-dev/pandas/core/arrays/datetimes.py", line 2177, in objects_to_datetime64ns
result, tz_parsed = tslib.array_to_datetime(
File "pandas/_libs/tslib.pyx", line 440, in pandas._libs.tslib.array_to_datetime
cpdef array_to_datetime(
File "pandas/_libs/tslib.pyx", line 706, in pandas._libs.tslib.array_to_datetime
raise
File "pandas/_libs/tslib.pyx", line 606, in pandas._libs.tslib.array_to_datetime
string_to_dts_failed = string_to_dts(
File "pandas/_libs/tslibs/np_datetime.pyx", line 288, in pandas._libs.tslibs.np_datetime.string_to_dts
format_buf = get_c_string_buf_and_size(None, &format_length)
File "pandas/_libs/tslibs/util.pxd", line 221, in pandas._libs.tslibs.util.get_c_string_buf_and_size
return PyUnicode_AsUTF8AndSize(py_string, length)
TypeError: bad argument type for built-in operation
So I think the conversion between None
and str
needs to happen at some point internally anyway
If exact
if False and format
is ''
, then it should match anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about?
if format is None:
# Make format_buf a nullptr
format_buf = 0
format_lenght = 0
else:
format_buf = get_c_string_buf_and_size(..., &format_length)
...
return parse_iso...
The rest of the code should work just fine given that the checks are conditioned on format_lenght
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah, looks good - I presume format_buf
would have to be '\0'
rather than 0
though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as format_len
is 0
, *format_buf
could be anything, so it is a bit clearer to have the pointer set to NULL
, imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how though? because if it's 0
, then we get
[1/1] Cythonizing pandas/_libs/tslibs/np_datetime.pyx
Error compiling Cython file:
------------------------------------------------------------
...
buf = get_c_string_buf_and_size(val, &length)
format_buf = get_c_string_buf_and_size(format, &format_length)
return parse_iso_8601_datetime(buf, length, want_exc,
dts, out_bestunit, out_local, out_tzoffset,
0, 0, exact)
^
------------------------------------------------------------
pandas/_libs/tslibs/np_datetime.pyx:290:35: Cannot assign type 'long' to 'const char *'
And with None
:
Error compiling Cython file:
------------------------------------------------------------
...
buf = get_c_string_buf_and_size(val, &length)
format_buf = get_c_string_buf_and_size(format, &format_length)
return parse_iso_8601_datetime(buf, length, want_exc,
dts, out_bestunit, out_local, out_tzoffset,
None, 0, exact)
^
------------------------------------------------------------
pandas/_libs/tslibs/np_datetime.pyx:290:35: Cannot assign None to const char *
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see, it is a cast issue. Does Cython support casts to C types? If so, something like format_buf = (const char*) 0
should do. Or, maybe, we can specify the type of format_buf
in advance... Sorry, I am not an expert in Cython :)
Maybe this will work?
if format is None:
format_buf: const char* = 0
format_len = 0
else:
format_buf = get_c_string_buf_and_size(val, &format_len)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried
format_buf = <const char*> 0
but then get
Segmentation fault
(with no further info)
For now I've gone with format = b''
, and handling None
inside this function does indeed look cleaner, thanks!
@@ -66,10 +66,24 @@ This file implements string parsing and creation for NumPy datetime. | |||
* | |||
* Returns 0 on success, -1 on failure. | |||
*/ | |||
|
|||
#define FORMAT_STARTSWITH(ch) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this a function instead of a macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro modifies two variables and has access to the goto location. A separate function then has to have at least two variables accepted with one of them being a pointer to a pointer. It also has to return a flag that needs a check every single time it is called. We can remove this pointer to a pointer, however, and modify just an offset variable, but checks are unavoidable. Is that a preferred approach? What do you think about checks being potentially redirected to the goto location in a macro? Or we could refactor the code and remove the goto once and for all, and handle the error message in a macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also just do one function that performs the check and in the caller run the goto in the failing branch while advancing the pointers in the succeeding branch. It is more verbose but also more explicit than the macro.
This is vendored from numpy - can you also check if there's something like this upstream? If not may be worth doing there as well |
to_datetime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review - I've changed the macro to a function
Regarding the numpy function this is vendored from - that one's a lot stricter, it only allows -
as a separator
whereas pandas allows multiple separators (so long as they're consistent)
char valid_ymd_sep[] = {'-', '.', '/', '\\', ' '}; |
so I don't think this would be useful to them
Regarding whether the pandas function should become stricter - not sure, it's not user-facing anyway, it just gives users a fast-path for ISO8601-like date strings
pandas/_libs/tslib.pyx
Outdated
@@ -445,6 +445,8 @@ cpdef array_to_datetime( | |||
bint utc=False, | |||
bint require_iso8601=False, | |||
bint allow_mixed=False, | |||
str format="", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, could do that here, but then wouldn't it still need to be str
when it's passed to the C function in
pandas/pandas/_libs/tslibs/np_datetime.pyx
Lines 287 to 290 in 45f82c3
format_buf = get_c_string_buf_and_size(format, &format_length) | |
return parse_iso_8601_datetime(buf, length, want_exc, | |
dts, out_bestunit, out_local, out_tzoffset, | |
format_buf, format_length, exact) |
? If I understand correctly, at some point the conversion from None
to ""
would have to happen anyway
@@ -66,10 +66,25 @@ This file implements string parsing and creation for NumPy datetime. | |||
* | |||
* Returns 0 on success, -1 on failure. | |||
*/ | |||
|
|||
inline int format_startswith(char ch, int format_len, char format, int exact) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure about this function. What is the advantage of this over using strncmp
? Seems like it would be cleaner to just use that plus some combination of exact
outside of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I think I understand a bit more now what you are trying to do. I think it would be best if you just kept a local variable for characters consumed, which you can increment every time you increment the format pointer (you are kind of doing this anyway). You don't really need a function but can rather just do if (format++ != '%') { // handle error }
The order of operations will do the postfix increment after assignment, so this consolidates what you are trying to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking I've understood, is the suggestion to get rid of the function and each time do something like
if (((format_len > 0) && (format++ != '%')) || (exact && (format_len<=0))){
if (format_len > 0) format_len--;
goto parse_error;
}
?
I presume I've misunderstood because to me this looks more complicated and loses the docstring 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to even do everything in one single if statement if ((exact && !format_len) || (format_len && format_len-- && *format++ != ch)) goto ...
, but it then starts to look as one of these tricky C language questions... This explicit function has comments, it is more clear imho, and is very likely to get inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not a matter of inlining as much as code organization. For instance, this block:
if (!format_startswith('%', format_len, *format, exact)) goto parse_error;
if (format_len) {++format; --format_len;}
if (!format_startswith('Y', format_len, *format, exact)) goto parse_error;
if (format_len) {++format; --format_len;}
Would seem more naturally expressed as:
if (format_len < 2 && exact) {
goto parse_error;
}
if (*format++ != '%') {
goto parse_error;
}
if (*format++ != 'Y') {
goto parse_error;
}
There's some code golf you can do therein to shorten it but that's not really what I'm after. Moreso we should refactor this to avoid trying to cram a lot into a function with side effects, as we don't use that pattern much if at all in our code base, and a lot of the character by character checks being done are overkill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other advantage of this is you could provide an actual error message as to what went wrong at what position of the format, though not critical to scope for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Can't we just refactor everything and replace it with something like
- match given format with the full format and return position of mismatch/success that depends on lengths and exactness.
- Parse the string with sprintf or something like this, with additional validity checks. Provided that 1 is a success
?
Unless doing everything in one go is a performance decision...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you see a better way to refactor I am definitely open to it. I don't think the performance here will be that critical.
Probably better served as a pre cursor before adding functionality here, or alternately as a follow up to the "side-effect-less" design mentioned already
@@ -104,19 +119,28 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc, | |||
while (sublen > 0 && isspace(*substr)) { | |||
++substr; | |||
--sublen; | |||
if (!format_startswith(' ', format_len, *format, exact)) goto parse_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit but can you run clang-format
against this file? I think it would format this differently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making really nice progress here. Mostly a suggestion on branching layout that I think can be applied to all cases
@@ -104,19 +105,30 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc, | |||
while (sublen > 0 && isspace(*substr)) { | |||
++substr; | |||
--sublen; | |||
if (format_len < 1 && exact) goto parse_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (format_len < 1 && exact) goto parse_error; | |
if (format_len < 1 && exact) { | |
goto parse_error; | |
} |
I think clang-format will automatically enforce this style for you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried clang-format
and it formatted it without the braces, i.e.
if (format_len < 1 && exact)
goto parse_error;
so I've gone with that for now
} | ||
|
||
if (sublen == 0) { | ||
goto parse_error; | ||
} | ||
|
||
/* PARSE THE YEAR (4 digits) */ | ||
if (format_len < 2 && exact) goto parse_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (format_len < 2 && exact) goto parse_error; | |
if (format_len < 2) { | |
if (exact) { | |
goto parse_error; | |
} | |
// What happens here? | |
} else { | |
if (*format++ != '%' && *format++ != 'Y') { | |
goto parse_error; | |
} | |
format_len -= 2; | |
} |
Current code has undefined behavior and a possible segfault if format_len
is 1 while exact
is False. Need to think through what action should be taken there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice that shouldn't be possible - this function's only used internally, and
pandas/pandas/_libs/tslibs/parsing.pyx
Lines 886 to 903 in 57d8d3a
def format_is_iso(f: str) -> bint: | |
""" | |
Does format match the iso8601 set that can be handled by the C parser? | |
Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different | |
but must be consistent. Leading 0s in dates and times are optional. | |
""" | |
iso_template = '%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S{micro_or_tz}'.format | |
excluded_formats = ['%Y%m%d', '%Y%m', '%Y'] | |
for date_sep in [' ', '/', '\\', '-', '.', '']: | |
for time_sep in [' ', 'T']: | |
for micro_or_tz in ['', '%z', '%Z', '.%f', '.%f%z', '.%f%Z']: | |
if (iso_template(date_sep=date_sep, | |
time_sep=time_sep, | |
micro_or_tz=micro_or_tz, | |
).startswith(f) and f not in excluded_formats): | |
return True | |
return False |
would return false if the format were for example %Y-%
. I've changed it to raise if format_len
isn't 0
anyway, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea what you will find with C is that you want to be extremely explicit. Even if it's not possible today, if someone refactors the code and it does happen things become very difficult to debug.
Our error checking in some of our extensions is really loose, which is why our internal code is relatively difficult to refactor
@@ -104,19 +105,30 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc, | |||
while (sublen > 0 && isspace(*substr)) { | |||
++substr; | |||
--sublen; | |||
if (format_len < 1 && exact) goto parse_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (format_len < 1 && exact) goto parse_error; | |
if (format_len < 1) { | |
if (exact) { | |
goto parse_error; | |
} | |
// Do something? | |
} else { | |
if (*format++ != ' ') { | |
goto parse_error; | |
} | |
format_len--; | |
} |
I think this control flow would work more universally and avoid logic pitfalls / segfaults (see other comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @WillAyd for your review!
} | ||
|
||
if (sublen == 0) { | ||
goto parse_error; | ||
} | ||
|
||
/* PARSE THE YEAR (4 digits) */ | ||
if (format_len < 2 && exact) goto parse_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice that shouldn't be possible - this function's only used internally, and
pandas/pandas/_libs/tslibs/parsing.pyx
Lines 886 to 903 in 57d8d3a
def format_is_iso(f: str) -> bint: | |
""" | |
Does format match the iso8601 set that can be handled by the C parser? | |
Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different | |
but must be consistent. Leading 0s in dates and times are optional. | |
""" | |
iso_template = '%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S{micro_or_tz}'.format | |
excluded_formats = ['%Y%m%d', '%Y%m', '%Y'] | |
for date_sep in [' ', '/', '\\', '-', '.', '']: | |
for time_sep in [' ', 'T']: | |
for micro_or_tz in ['', '%z', '%Z', '.%f', '.%f%z', '.%f%Z']: | |
if (iso_template(date_sep=date_sep, | |
time_sep=time_sep, | |
micro_or_tz=micro_or_tz, | |
).startswith(f) and f not in excluded_formats): | |
return True | |
return False |
would return false if the format were for example %Y-%
. I've changed it to raise if format_len
isn't 0
anyway, thanks
@@ -104,19 +105,30 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc, | |||
while (sublen > 0 && isspace(*substr)) { | |||
++substr; | |||
--sublen; | |||
if (format_len < 1 && exact) goto parse_error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried clang-format
and it formatted it without the braces, i.e.
if (format_len < 1 && exact)
goto parse_error;
so I've gone with that for now
.pre-commit-config.yaml
Outdated
'--extensions=c,h', | ||
'--headers=h', | ||
--recursive, | ||
'--filter=-readability/casting,-runtime/int,-build/include_subdir,-readability/fn_size' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parse_iso_8601_datetime
is now more than 500 non-comment lines, so I've turned off that check for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate to this but I would also be fine exploring another tool aside from cpplint. I haven't seen many other projects use this tool, particularly for C instead of C++. clang has a formating and static analyzer tool that would likely be more useful
@@ -104,19 +107,44 @@ int parse_iso_8601_datetime(const char *str, int len, int want_exc, | |||
while (sublen > 0 && isspace(*substr)) { | |||
++substr; | |||
--sublen; | |||
if (format_len < 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK looking great. Now I think we've established a pretty clear pattern for a function. So with inputs of:
- format string
- format length remaining
- characters to compare
- whether exactness matters or not
I think you can now convert this back into the reusable function you always wanted. Something like:
// This function will advance the pointer on format
// and decrement characters_remaining by n on success
// On failure will return -1 without incrementing
int compare_format(const char **format, int *characters_remaining,
const char *compare_to, int n, const int exact) {
if (*characters_remaining < n) {
if (exact) {
// TODO: in the future we should set a PyErr here
// to be very clear about what went wrong
return -1;
} else {
// TODO: same return value in this function as
// above branch, but stub out a future where
// we have a better error message
return -1;
}
}
else {
if (!strncmp(*format, compare_to, n)) {
// TODO: PyErr to differentiate what went wrong
return -1;
} else {
*format += n;
*characters_remaining -= n;
return 0;
}
}
}
Calling this like compare_format(&format, &format_len, '%y', 2, check)
should do the mutation you are looking for and give us a better pattern to set explicit error messages about what went wrong. Ideally would do this along with this function, but I think in a follow up it would be better if we replaced the goto
with a -1
return value + setting a PyError, as that is a common practice in CPython development
Thanks @WillAyd for your review, I've learned a few new things! I wasn't sure how to pass a character (like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work lgtm @mroeschke
Just a merge conflict otherwise very nice! |
…er/format_iso Co-authored-by: fdrocha <> Co-authored-by: nikitaved <>
> > Co-authored-by: fdrocha <> Co-authored-by: nikitaved <>
Thanks Will and Matthew for your reviews/approvals! Let's ship this then, and thanks @nikitaved and @fdrocha for the collaboration |
Thank you very much, @MarcoGorelli, @fdrocha , @WillAyd , @mroeschke , for this amazing experience! I would really love to contribute more :) |
…SO8601 (pandas-dev#49333) * initial format support Co-Authored-By: MarcoGorelli <> Co-Authored-By: FDRocha <> * set exact=False default in objects_to_datetime * 🏷️ typing * simplify * replace macro with function * clean up * 📝 restore docstring * inline * set format default to None * clean up * remove function, perform check inline * only compare *format++ if format_len * clean up * typing * split out branches * use compare_format function * remove tmp variable * Add co-authors > > Co-authored-by: fdrocha <> Co-authored-by: nikitaved <> Co-authored-by: Marco Gorelli <>
…SO8601 (pandas-dev#49333) * initial format support Co-Authored-By: MarcoGorelli <> Co-Authored-By: FDRocha <> * set exact=False default in objects_to_datetime * 🏷️ typing * simplify * replace macro with function * clean up * 📝 restore docstring * inline * set format default to None * clean up * remove function, perform check inline * only compare *format++ if format_len * clean up * typing * split out branches * use compare_format function * remove tmp variable * Add co-authors > > Co-authored-by: fdrocha <> Co-authored-by: nikitaved <> Co-authored-by: Marco Gorelli <>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.EDIT (by MarcoGorelli): this is something @nikitaved, @fdrocha, and I started working on together at a company retreat